Barra, a Parallel Functional GPGPU Simulator

Authors

  • Sylvain Collange
  • David Defour
  • David Parello
Abstract

We present a functional GPU simulator for GPGPU workloads, built on the UNISIM framework, that takes unaltered NVIDIA CUDA executables as input. It simulates the native instruction set of the Tesla architecture at the functional level and generates detailed execution statistics. Simulation speed is competitive with the less accurate CUDA emulation mode thanks to optimizations that exploit the inherent parallelism of GPGPU applications. It also opens the way for design space exploration of GPU microarchitectures.

Similar resources

Barra, a Modular Functional GPU Simulator for GPGPU

The use of GPUs for general-purpose applications promises huge performance returns for a small investment. However, the internal design of such processors is undocumented and many details are unknown, preventing developers from optimizing their code for these architectures. One solution is to use functional simulation to determine program behavior and gather statistics when counters are missing or u...

GPGPU-Accelerated Instruction Accurate and Fast Simulation of Thousand-core Platforms

Future architectures will feature hundreds to thousands of simple processors and on-chip memories connected through a network-on-chip. Architectural simulators will remain primary tools for design space exploration and for performance (and power) evaluation of these massively parallel architectures. However, architectural simulation performance is a serious concern, as virtual platforms and simulation...

FATSEA – An Architectural Simulator for General Purpose Computing on GPUs

We present FATSEA, a functional and performance evaluation simulator written in C++ to handle kernels written in the CUDA programming language and aimed at GPGPU computing. FATSEA takes Parallel Thread eXecution (PTX) code as input, a device-independent code format generated by the Nvidia CUDA compiler, to validate results and estimate performance on Nvidia platforms. This paper shows ...

Fault injection on GPGPU application

Today, with the development of GPU computing techniques in terms of architectures and hardware and software support, people have realized that intensive computing workloads can be ported to GPU devices. Applications can exploit GPUs' characteristics for parallel computing and gain a significantly higher speedup compared to CPU architectures. However, failures are still unavoidable. People have alrea...

A complete and efficient CUDA-sharing solution for HPC clusters

In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes,...

Publication date: 2009